ABSTRACT

The use of photometric redshifts in cosmology is increasing. Often, however, these photo-zs
are treated like spectroscopic observations, in that the peak of the photometric redshift, rather 
than the full probability density function (PDF), is used. This overlooks useful information 
inherent in the full PDF. We introduce a new real-space estimator for one of the most used 
cosmological statistics, the 2-point correlation function, that weights by the PDF of individual 
photometric objects in a manner that is optimal when Poisson statistics dominate. As our 
estimator does not bin based on the PDF peak it substantially enhances the clustering signal 
by usefully incorporating information from all photometric objects that overlap the redshift 
bin of interest. As a real-world application, we measure QSO clustering in the Sloan Digital 
Sky Survey (SDSS). We find that our simplest binned estimator improves the clustering signal 
by a factor equivalent to increasing the survey size by a factor of 2-3. We also introduce a 
new implementation that fully weights between pairs of objects in constructing the cross- 
correlation and find that this pair-weighted estimator improves clustering signal in a manner 
equivalent to increasing the survey size by a factor of 4-5. Our technique uses spectroscopic 
data to anchor the distance scale and it will be particularly useful where spectroscopic data 
(e.g., from BOSS) overlaps deeper photometry (e.g., from Pan-STARRS, DES or the LSST).
We additionally provide simple, informative expressions to determine when our estimator will 
be competitive with the autocorrelation of spectroscopic objects. Although we use QSOs as 
an example population, our estimator can and should be applied to any clustering estimate 
that uses photometric objects. 

Key words: methods: analytical - methods: statistical - surveys - quasars: general - galaxies: 
statistics - large-scale structure of Universe. 



1 INTRODUCTION 

With the advent of deep and wide multi-band photometric surveys 
there has been a resurgence of interest in photometric redshifts as a 
means of estimating the distance to a range of astrophysical objects. 
Depending on the objects of interest and the information to hand, 
the derived photometric redshifts will be of varying precision and 
accuracy, but all can be described by a probability density function 
(PDF). As our understanding of photometric redshifts improves our 
confidence in, and ability to characterise, these PDFs, their use in 
cosmological statistical analyses is sure to increase. 

In the sense that photo-zs represent color-redshift relations,
the use of an ensemble of PDFs for a set of objects is a decades-old
approach (e.g. Koo 1999 and references therein). An example of this
is the selection of cluster galaxies (e.g., via the Red Sequence;
Gladders & Yee 2000). Cluster galaxy selection techniques have, in
fact, recently been updated to incorporate full PDFs (van Breukelen &
Clewley 2009) but approaches that use full PDFs remain rare. Gaussian
PDFs have been used to estimate luminosity functions, a problem that
has been studied for more arbitrary PDF shapes by Chen et al. and
Sheth (2007). Full PDFs are particularly underutilised in clustering
work, where the use of broad redshift bins is more prevalent. By using
broad redshift bins to measure photometric clustering one can
ameliorate uncertainties in the photo-z "peak", but typically at the
expense of constraining power.

One of the most fundamental statistics of any population of
objects, and one which carries much physical information, is the
2-point correlation function (e.g. Totsuji & Kihara 1969). Provided
the redshift distribution of the objects is well known, the underlying
3D clustering can be robustly inferred from the measured clustering
in projection (Limber 1953), but the number of objects required
increases dramatically when the redshift distribution is broad. For
this reason, estimates of the 2-point function can in principle gain
tremendously from improved utilization of the redshift information
associated with photometric objects.

Often photo-zs are derived from the information in a subset of
the objects for which spectroscopy has been obtained. In addition
to calibrating the photo-zs, this subset of spectroscopic objects can
be used as distance anchors with which to set the real-space
transverse scale for distances to the photometric objects. Measuring the
cross-clustering of photometric objects around spectroscopic objects
has several advantages: the properties of the spectroscopic
objects, such as luminosity or spectral type, are precisely known;
the photometric objects are distributed more uniformly, meaning
their background clustering signature (the "mask") is simple to
obtain, and issues like fiber collisions and more complex hidden
selection dependencies that might be introduced by the spectrographic
setup are completely absent; the cross-correlation probes
the clustering only in a well-defined and localised z-range,
reducing the sensitivity to photometric outliers; and the number of
pairs is dramatically increased by using the higher number density
of the photometric sample to improve statistics. The use of
spectroscopic-photometric cross-correlations to estimate clustering
is not new (e.g. Longair & Seldner 1979; Yee & Green 1984, 1987;
Wold et al. 2000; Hill & Lilly 1991); however, using the information
inherent in full PDFs to improve the clustering signal in
cross-correlation methods is in its infancy.

In this paper we develop a clustering measure which uses 
the full photometric redshift PDF and which optimally weights 
photometric-spectroscopic pairs in the limit that the error is Pois- 
son. Our method circumvents the need to use the peak of the pho- 
tometric redshift PDF to select which objects lie in a redshift bin 
of interest, or indeed to bin objects at all. It allows every object 
that can be assigned a photometric redshift to be usefully
cross-correlated against every spectroscopic object in the interval of
interest. We also provide simple, informative equations that indicate
when photometric redshifts are precise enough, for a given sample 
size, to provide improved constraints over the spectroscopic auto- 
correlation. We find that this condition is very hard to satisfy, which 
explains why even relatively small spectroscopic surveys can pro- 
duce clustering measurements comparable to much larger photo- 
metric samples. We additionally provide a quick method to cal- 
culate how much our optimal weighting scheme for spectroscopic- 
photometric cross-correlations can help satisfy this condition by us- 
ing full PDF information. The various equations we discuss should 
be very useful in establishing a survey design to optimise clustering 
measurements. 

To demonstrate our approach with real-world data we ap- 
ply our new method to measure the clustering of quasars (QSOs). 
The measurement of QSO clustering sheds light on both QSO 
demographics and the physics powering these systems. The
amplitude of clustering on large scales is related to the masses of
the dark matter halos which host the QSOs (their environment),
which together with the observed number density allows QSO
lifetimes or duty cycles (Cole & Kaiser 1989; Haiman & Hui 2001;
Martini & Weinberg 2001) to be constrained. The small-scale
clustering of QSOs can shed light on their triggering mechanism, and
on the nature of QSO progenitors.

With the advent of large, well-characterised samples,
QSOs can now be efficiently photometrically classified (e.g.
Richards et al. 2004; D'Abrusco et al. 2009; Richards et al.
2009a,b) but still have quite imprecise photometric redshifts (e.g.
Budavari et al. 2001; Richards et al. 2001; Weinstein et al. 2004;
Ball et al. 2008). This suggests that an estimator that takes full
advantage of the information in a photometric redshift might be
expected to dramatically improve measurements of the clustering
of QSOs. Most previous work on QSO clustering used purely
spectroscopic analysis (Porciani, Magliocchetti & Norberg 2004;
Croom et al. 2005; Porciani & Norberg 2006; Hennawi et al. 2006;
Shen et al. 2007; da Angela et al. 2008; Myers et al. 2008)
but all such analyses are limited by the extremely low number
density of objects with spectra. Higher number densities of
objects can be achieved by using photometric QSO selection
(Myers et al. 2006, 2007a,b) but systematic errors must be
carefully controlled because photometric redshifts for QSOs are still
frequently inaccurate. The use of cross-correlations to measure
QSO clustering has thus proven quite popular (e.g. Croom et al.
2004; Adelberger & Steidel 2005a,b; Serber et al. 2006; Coil et al.
2007; Strand, Brunner & Myers 2008; Padmanabhan et al. 2009;
Mountrichas et al. 2009). Our new technique builds on such
approaches, particularly that of Padmanabhan et al. (2009), by
incorporating new information from photometric PDFs to improve
the clustering signal.

We note that, although we choose QSOs as our illustrative
data set, our methods and results are significantly more general
and our optimal estimator will improve the signal for any
real-space clustering measurement that uses photometric redshifts.
Although the methods developed in this paper can be easily applied
to any spectroscopic-photometric cross-correlation measurement,
they will be of particular use in upcoming surveys where sparse
spectroscopic data (e.g., from BOSS) is embedded in deeper
photometric data, such as from Pan-STARRS, DES and the LSST.

The outline of the paper is as follows. §2 introduces our
new optimal spectroscopic-photometric cross-clustering estimator.
In §3 we introduce the QSO data we use as an example,
and in §4 we present the clustering results of this sample and
use it to demonstrate the improvement our new technique
provides over existing estimators that do not utilise the full PDF.
We finish in §5 with some conclusions and lessons learned. We
assume a $\Lambda$CDM cosmological model with $\Omega_m = 0.25$ and
$\Omega_\Lambda = 0.75$, consistent with the maximum likelihood estimates
from the 5-year WMAP data (Dunkley et al. 2009). All quoted
magnitudes are corrected for Galactic extinction using the dust
maps of Schlegel, Finkbeiner & Davis (1998).



2 METHODOLOGY

2.1 Real Space Clustering Measurements with Photometric
Objects

Imagine we have a set of objects for which multi-band photometry
has allowed us to estimate photometric redshifts and a second
(possibly disjoint) set of objects for which spectroscopic redshifts are
available. For the spectroscopic objects we know (up to small
uncertainties due to peculiar velocities and uncertainties in the
background cosmology) a physical distance to each object, which can be
used to anchor the physical scale. Consider the cross-clustering
between the set of objects with known spectroscopic redshifts and the
set of objects for which only photometric redshifts are known. To
begin, let us assume that the spectroscopic objects all lie at a single
redshift (and hence distance, $\chi_*$) and relax this assumption later.
We may estimate the correlation function using the DD/DR
estimator (e.g. Shanks et al. 1983)¹



$$ w_\theta(R) = \frac{N_R}{N^{\rm phot}}\,\frac{D_sD_p(R)}{D_sR_p(R)} - 1 , \tag{1} $$

where we are measuring the cross-clustering of pairs of spectroscopic
and photometric objects, "D" denotes a data point, "R" denotes
a point drawn from a random catalogue that mimics the data
distribution, and the subscripts "p" and "s" denote "photometric"
and "spectroscopic". The factor $N_R/N^{\rm phot}$ scales the counts
appropriately if the random catalogue has a different size than the
photometric catalogue. We denote the random points $R_p$ both to
specify that the random distribution mimics the photometric data
and to distinguish the term from $R = \chi_*\theta$, the transverse
separation. Note that Eq. (1) only requires knowledge of the angular
selection function, or "mask", of the photometric data, not the
typically far more complex selection function of the spectroscopic data.
We have labeled this estimator $w_\theta(R)$ because it looks like a
normal angular correlation function in the photometric sample, except
that angles have been converted to distances using the distance to
the spectroscopic partner.

¹ More complex estimators, such as that of Landy & Szalay (1993), could
also be used. One would simply substitute each estimator into Eq. (11) or
(13), evaluating the $R_s(\chi_*\theta)$ terms at different angular positions
but at the comoving distance of the spectroscopic data point.
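The conversion from angle to transverse separation, $R = \chi_*\theta$, needs only the comoving distance to the spectroscopic partner. The following is a minimal sketch of that step, assuming the flat $\Lambda$CDM parameters adopted in this paper ($\Omega_m = 0.25$, $\Omega_\Lambda = 0.75$). The function names and the choice of Simpson's rule are illustrative assumptions, not part of the original analysis.

```python
import math

OMEGA_M = 0.25           # matter density adopted in this paper
OMEGA_L = 0.75           # cosmological constant (flat universe)
C_OVER_H0 = 2997.92458   # c / (100 km/s/Mpc): Hubble distance in h^-1 Mpc


def comoving_distance(z, steps=2000):
    """Line-of-sight comoving distance in h^-1 Mpc (hypothetical helper).

    Integrates dz / E(z) from 0 to z with Simpson's rule, where
    E(z) = sqrt(Omega_m (1 + z)^3 + Omega_Lambda).
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = z / steps

    def inv_e(zz):
        return 1.0 / math.sqrt(OMEGA_M * (1.0 + zz) ** 3 + OMEGA_L)

    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * inv_e(k * h)
    return C_OVER_H0 * total * h / 3.0


def transverse_separation(theta_deg, z_spec):
    """R = chi_* theta, using the distance to the spectroscopic object."""
    return comoving_distance(z_spec) * math.radians(theta_deg)
```

Because the distance is always taken from the spectroscopic member of a pair, the photometric redshift never enters the separation itself; it enters only through the weights discussed later.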

As detailed in Padmanabhan et al. (2009) we infer the
projected, real-space, cross-correlation function, $w_p(R)$, under the
assumption that the clustering is constant across the redshift slice and
within the Limber (1953) approximation, using the relation

$$
\begin{aligned}
w_\theta(R) &= \int d\chi\, f(\chi)\, \xi(R, \chi-\chi_*) && (2)\\
            &\simeq f(\chi_*) \int d\Delta\chi\, \xi(R, \chi-\chi_*) && (3)\\
            &= f(\chi_*)\, w_p(R) , && (4)
\end{aligned}
$$

where $f(\chi)$ is the normalised radial distribution function of the
photometric objects, with $\int f(\chi)\,d\chi = 1$, and the
spectroscopic objects lie at $\chi_*$. Note that this is a real space measurement
and for broad enough $f(\chi)$ we can use the real-space correlation
function in the integral, avoiding the need to model redshift-space
distortions. Also note that we are making use of the fact that $f(\chi)$ is
typically almost constant across the entire line-of-sight range of
integration employed in defining $w_p$. If this is not true then a more
sophisticated analysis, which factors in the changing selection
function of "random pairs" with distance, is required.

For a distribution of spectroscopic redshifts one replaces
$f(\chi_*)$ in the above with the average, $\langle f(\chi_*)\rangle$, across the
spectroscopic distribution. For a small spectroscopic bin ($\chi_1 < \chi < \chi_2$)
the redshift distribution will typically be flat. In this case, $\langle f(\chi_*)\rangle$
tells us the fraction of objects in the photometric data set that
genuinely have redshifts in the spectroscopic bin of interest ($f_z$) per
comoving interval: $\langle f(\chi_*)\rangle \simeq f_z/(\chi_2 - \chi_1)$.

We can use Eq. (4) to answer the question: how large does a
photometric sample need to be before a photometric-spectroscopic
cross-clustering measurement can compete with a spectroscopic
auto-correlation? Clearly, clustering estimates using photometric
objects will improve as photometric redshift precision (and
accuracy) approaches the level of a spectroscopic redshift (though in
this limit our assumption of constant $f(\chi)$ breaks down). In the
limit that the objects of interest are rare enough that their clustering
is dominated by Poisson shot-noise, the angular bins in $w_\theta(R)$
are independent and

$$ \frac{\delta w_\theta}{1+w_\theta} = N_{\rm pair}^{-1/2}
   \quad\Rightarrow\quad
   \frac{\delta w_p}{w_p} = \left[1 + \frac{1}{f\,w_p}\right] N_{\rm pair}^{-1/2} , \tag{5} $$

where $N_{\rm pair}$ is the number of data pairs in the bin and $f$ is
$\langle f(\chi_*)\rangle$ for the photometric sample. Note that both $f^{-1}$ and $w_p$
have dimensions of length. Eq. (5) neatly shows the main
drawback of spectroscopic-photometric cross-correlation measurements
as compared to auto-correlation measurements using only
spectroscopic objects. If the photometric redshift solutions are
significantly extended along the line-of-sight then $f$ is small (perhaps as
low as the reciprocal of the depth of the survey). This suppresses the
measured clustering, $w_\theta$, which for a given sample is proportional
to $f$. A very large number of pairs are thus necessary to measure
$w_\theta$ with any precision.

How large is the typical suppression? When measuring the
spectroscopic auto-correlation the clustering is integrated along the
line-of-sight to eliminate the effects of redshift-space distortions.
The limits of integration tend to vary from author to author but
typically the line-of-sight interval is $O(100\,h^{-1}\,{\rm Mpc})$. In the
language of Eq. (5) such an auto-correlation estimate can approach a
limit of $f \sim 0.01\,h\,{\rm Mpc}^{-1}$. If the photometric sample is extended
over, say, $1\,h^{-1}\,{\rm Gpc}$, then $f = O(10^{-3}\,h\,{\rm Mpc}^{-1})$. Because the
fractional error in Eq. (5) scales as $1/(f\,N_{\rm pair}^{1/2})$ when the
clustering is weak, the number of photometric objects then needs to be
larger by a factor of ~ 100 in order to measure the clustering as well
as if precise redshifts were known. If the extent is
$500\,h^{-1}\,{\rm Mpc}$ one needs ~ 25 times more objects, and for
$300\,h^{-1}\,{\rm Mpc}$ one needs ~ 10 times as many. Of
course, if obtaining spectroscopy or improved PDFs for the
photometric sample is unrealistic then one has no other choice but to use
the existing information.
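The scaling quoted above follows directly from Eq. (5). As a hedged illustration, the sketch below evaluates the fractional error and the factor by which the pair count must grow when $f$ shrinks, assuming the weak-clustering limit ($f\,w_p \ll 1$) and taking $f$ to be the reciprocal of the line-of-sight extent. The function names and the default of $f = 0.01\,h\,{\rm Mpc}^{-1}$ for the spectroscopic case are illustrative choices.

```python
import math


def fractional_error(f, w_p, n_pair):
    """delta(w_p) / w_p from Eq. (5): [1 + 1 / (f w_p)] / sqrt(N_pair)."""
    return (1.0 + 1.0 / (f * w_p)) / math.sqrt(n_pair)


def sample_size_factor(f_phot, f_spec=0.01):
    """Growth in N_pair needed to offset a smaller f.

    Assumes weak clustering, where the error scales as 1 / (f sqrt(N_pair)),
    so the required pair count scales as f^-2.
    """
    return (f_spec / f_phot) ** 2


if __name__ == "__main__":
    for extent in (1000.0, 500.0, 300.0):   # line-of-sight extent, h^-1 Mpc
        f_phot = 1.0 / extent               # f taken as 1 / extent
        print(extent, round(sample_size_factor(f_phot), 1))
```

For extents of 1000, 500 and 300 $h^{-1}$ Mpc this gives factors of roughly 100, 25 and 9, in line with the approximate values quoted in the text.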



2.2 An Optimal Estimator for Real-Space Clustering using 
Photometric Redshifts 

We have noted two major drawbacks to measuring the real-space
clustering of photometrically classified objects around
spectroscopic objects. First, it is not clear how to establish which
photometric objects should be cross-correlated with a given set of
spectroscopic objects. The typical approach would be to use objects
with a peak photometric redshift solution in the redshift bin of
interest. This, however, discards much of the information codified in
the photometric redshift PDF and ignores the fact that an object
with a peak photometric redshift in the range of interest may
actually have less chance of being in that redshift range than an object
with a peak photometric redshift beyond that range, particularly
as the peak of the PDF may itself be poorly defined. We illustrate
this in Figure 1. The second drawback is the possible extension of
the ensemble of the photometric redshifts along the line-of-sight,
which causes $f$ to be small in Eq. (5).

We now introduce a new method designed to circumvent these
issues. Consider breaking the photometric sample into very thin
slices in photometric redshift, $z_p$, and labelling the slices from
$i = 1, \cdots, k$. Each photometric sample, $i$, provides an estimate
of $w_p(R)$ via $w_\theta(R)/f_i$. Writing this estimate as $w_i(R)$, with an
error proportional to $f_i^{-1}\,(N_i^{\rm phot})^{-1/2}$ in the limit of weak clustering, we
can inverse variance weight the different measurements to obtain

$$ w_p(R) \propto \sum_i N_i^{\rm phot} f_i^2\, w_i(R) , \tag{6} $$



¹ We prefer the robustness of Eq. (1) to likely inaccuracies in the
spectroscopic "mask" over, e.g., the reduced variance of the
Landy & Szalay (1993) estimator.



where $N_i^{\rm phot}$ is the number of photometric objects in sample $i$.
This circumvents the issue of which photometric objects to
cross-correlate against a set of spectroscopic objects in a chosen bin of
redshift. Clearly photometric samples which peak at very different
redshifts from the spectroscopic sample are significantly
down-weighted in the sum. Note that our method also down-weights both
objects with unusual colours that might have multi-peaked PDFs
and objects with poorly constrained photometry, such as near
survey limits, where the PDF might be very broad.

Figure 1. In analyses that use the PDF peak, only the PDF in the centre
panel ($z_{\rm peak} = 2.17$) would be considered to overlap the spectroscopic
bin of interest ($1.8 < z_{\rm spec} < 2.2$ in this plot). In reality each PDF has
a 50% overlap with the spectroscopic bin. We illustrate some typical
problems with using PDF peaks: PDFs that overlap the spectroscopic bin but
have a preferred peak solution far from the bin (a "catastrophic" redshift;
upper panel), PDFs with a peak solution in the bin but that are smeared
out across a large range of redshifts (centre panel), and well-defined PDFs
that lie just outside the bin of interest (lower panel). The PDFs are for real
photometric QSOs calculated using the method of Ball et al. (2008).

Figure 2. The calculation of $\langle f(\chi_*)\rangle$ and $f_i$, the "comoving overlaps", in
units of $10^{-3}\,h\,{\rm Mpc}^{-1}$. The upper panel demonstrates the old method
(§2.1), in which the photometric redshift PDFs are combined into
an ensemble, normalised, comoving distribution, $\langle f(\chi_*)\rangle$, averaged over the
spectroscopic bin of interest ($1.8 < z_{\rm spec} < 2.2$ in this plot). The lower panels
demonstrate our new bin-weighted estimator (Eq. 11), in which each PDF is
transformed into a normalised comoving distribution and averaged across
the bin of interest ($f_1, f_2, f_3, \ldots, f_k$). The lower panels display the case for
$N_i^{\rm phot} = 1$ but any number $N_i^{\rm phot}$ of PDFs can be combined
into an ensemble.

Since the binning is so far arbitrary we can consider the limit
where each slice in Eq. (6) represents a single photometric object,
i.e. $N_i^{\rm phot} = 1$ for each $i$. In this case photometric objects that have
some overlap with the spectroscopic bin of interest are included
in the sum and photometric objects with zero overlap have zero
weight. Treating the photometric objects individually, rather than
in an ensemble, removes the need for any arbitrary binning,
effectively reduces the extension of the ensemble PDF along the
line-of-sight, and should thus significantly improve the clustering
signal-to-noise.
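The inverse-variance combination of Eq. (6) can be sketched in a few lines. The example below assumes per-slice measurements of $w_\theta$ are already in hand and normalises the weights $N_i^{\rm phot} f_i^2$ to sum to one, which is one natural reading of the proportionality in Eq. (6). The function name and argument layout are hypothetical.

```python
def combine_slices(w_theta, f, n_phot):
    """Inverse-variance combination of per-slice estimates of w_p.

    w_theta[i] : measured w_theta for slice i
    f[i]       : comoving overlap f_i of slice i with the spectroscopic bin
    n_phot[i]  : number of photometric objects in slice i

    Each slice estimates w_p as w_theta[i] / f[i], with variance proportional
    to 1 / (N_i f_i^2) for weak clustering and Poisson errors, so the weight
    of slice i is N_i f_i^2. Slices with zero overlap get zero weight.
    """
    weights = [n * fi ** 2 for n, fi in zip(n_phot, f)]
    norm = sum(weights)
    if norm == 0.0:
        raise ValueError("no slice overlaps the spectroscopic bin")
    estimates = [wt / fi if fi > 0.0 else 0.0 for wt, fi in zip(w_theta, f)]
    return sum(wgt * est for wgt, est in zip(weights, estimates)) / norm
```

With $N_i^{\rm phot} = 1$ every photometric object is its own slice, so an object whose PDF has no overlap with the spectroscopic bin drops out automatically, whatever its measured $w_\theta$.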

Because the weights in Eq. (6) are $\propto f_i^2 N_i^{\rm phot}$, a rough
determination of how much this new estimator will improve the
signal-to-noise of a $w_p$ estimate over existing methods, which only
consider objects that have a peak photometric redshift in the bin of
interest, is

$$ \frac{(S/N)_i}{(S/N)_n} \simeq
   \left[\frac{\sum_i f_i^2\, N_i^{\rm phot}}{n\,\langle f(\chi_*)\rangle^2}\right]^{1/2} , \tag{7} $$

where the $i$ subscripts represent our new optimal estimator for a
slice containing $N_i^{\rm phot}$ photometric objects and the $n$ represents
the number of photometric objects with a PDF peak in the
spectroscopic bin of interest. The $f_i$ are the comoving fractional
photometric redshift overlaps for objects in slice $i$ and $\langle f(\chi_*)\rangle$ is the
same for the ensemble of photometric objects with a peak
photometric redshift in the spectroscopic bin of interest. This is
illustrated in Figure 2, in which the upper panel plots the ensemble of
the ($n = 110410$) PDFs with $1.8 < z_{\rm peak} < 2.2$. This ensemble
has an $\langle f(\chi_*)\rangle = 1.26 \times 10^{-3}\,h\,{\rm Mpc}^{-1}$ overlap with the true
range $1.8 < z < 2.2$. The lower panels plot three individual (i.e.
$N_i^{\rm phot} = 1$) PDFs and their overlaps with $1.8 < z < 2.2$.
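As a rough numerical companion to Eq. (7), the sketch below compares the two approaches under the assumption, implied by Eq. (5) in the weak-clustering limit, that signal-to-noise scales as $f\,N^{1/2}$. It takes $N_i^{\rm phot} = 1$ throughout. The function name is hypothetical and the expression is this simplified scaling, not a replacement for a full error analysis.

```python
import math


def expected_improvement(f_individual, f_ensemble, n_peak_in_bin):
    """Approximate signal-to-noise gain of per-object PDF weighting.

    f_individual  : overlaps f_i for every photometric object (N_i = 1)
    f_ensemble    : <f(chi_*)> for the objects whose PDF peak is in the bin
    n_peak_in_bin : number n of objects whose PDF peak is in the bin

    Returns sqrt(sum_i f_i^2) / (<f> sqrt(n)). The square of this value
    approximates the equivalent increase in survey size.
    """
    new = math.sqrt(sum(fi ** 2 for fi in f_individual))
    old = f_ensemble * math.sqrt(n_peak_in_bin)
    return new / old
```

If every object had the same overlap as the ensemble and only the peak-selected objects were used, the ratio would be exactly one; gains come from objects outside the peak selection that still overlap the bin, and from up-weighting objects with sharply peaked PDFs.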



2.3 The Optimal Estimator in Practice 

In §4 we illustrate the degree to which our optimal estimator can
improve clustering estimates for a "typical" analysis, using a
sample of spectroscopic and photometric QSOs. QSOs may be
particularly well suited to our estimator as they are rare enough that their
clustering is dominated by Poisson noise (e.g., see Figure 4) out to
reasonably large scales and $f(\chi)$ is quite broad. We note, though,
that our optimal estimator should improve the signal-to-noise for
any photometric clustering analysis. The exact methodology we use
in practice is as follows. Eq. (6) can be rewritten as

$$ w_p(R) = \sum_i c_i\, w_{\theta,i}(R) \tag{8} $$

where

$$ c_i = \frac{N_i^{\rm phot} f_i}{\sum_j N_j^{\rm phot} f_j^2} \tag{9} $$

and we have used $w_p = w_\theta/f_i$. Now, consider substituting Eq. (1),
the typical DD/DR estimator for $w(\theta)$, into Eq. (8):

$$ w_p(R) = \sum_i c_i \left[ \frac{N_R}{N_i^{\rm phot}}\,
   \frac{D_sD_p(R)}{D_sR_p(R)} - 1 \right] , \tag{10} $$

where the transverse separation, $R$, is evaluated using the angle
between a spectroscopic-photometric pair and the distance to the
spectroscopic object. Finally we obtain a simple equation for
calculating the real-space clustering of a sample of photometric objects
with full PDFs around a sample of spectroscopic objects:

$$ w_p(R) = N_R \sum_i \frac{c_i}{N_i^{\rm phot}}\,
   \frac{D_sD_p(R)}{D_sR_p(R)} - \sum_i c_i . \tag{11} $$
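Eq. (11) reduces to a weighted sum over data-data pair counts. The sketch below is one possible arrangement for the single-object limit $N_i^{\rm phot} = 1$, assuming the pair counts have already been binned in $R$ and that the data-data counts are stored per photometric object. The function name and data layout are assumptions made for illustration.

```python
def bin_weighted_wp(dd_per_object, dr, f, n_random):
    """Bin-weighted estimate of w_p(R) following Eq. (11) with N_i^phot = 1.

    dd_per_object[i][k] : number of spectroscopic partners of photometric
                          object i in transverse-separation bin k
    dr[k]               : spectroscopic-random pair counts in bin k
                          (assumed non-zero)
    f[i]                : comoving overlap f_i of object i with the bin
    n_random            : size N_R of the random catalogue
    """
    norm = sum(fi ** 2 for fi in f)     # sum_j N_j f_j^2 with N_j = 1
    c = [fi / norm for fi in f]         # weights c_i of Eq. (9)
    c_sum = sum(c)
    wp = []
    for k, dr_k in enumerate(dr):
        weighted_dd = sum(ci * dd_i[k] for ci, dd_i in zip(c, dd_per_object))
        wp.append(n_random * weighted_dd / dr_k - c_sum)
    return wp
```

Keeping the data-data counts per object, rather than pre-summed, is what makes it straightforward to recompute the estimate on subsamples when building error estimates.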



The $N_i^{\rm phot}$ factor reflects the fact that care must be taken to
weight the random catalogue correctly, i.e., on a slice-by-slice
basis. Note that $\sum_i c_i \simeq f^{-1}(\chi_*)$ approximates the reciprocal of
$\langle f(\chi_*)\rangle$ from the unweighted estimator. We prefer Eq. (11) to other
versions of this expression as it facilitates simple tracking of the
data-data counts to construct error estimates from subsampling of
the counts.

Finally, we note that one can express the weights in Eq. (9)
based on overlaps between each individual spectroscopic and
photometric object (i.e. weighting fully by pairs rather than by how
much a photometric object overlaps a bin of many spectroscopic
objects) without loss of generality. The equations of interest would
then reduce to

$$ c_{i,j} = \frac{N_i^{\rm phot} N_j^{\rm spec} f_{i,j}}
   {\sum_{i,j} N_i^{\rm phot} N_j^{\rm spec} f_{i,j}^2} \tag{12} $$

where $N_j^{\rm spec}$ is the number of spectroscopic objects in slice $j$. We
will choose $N_j^{\rm spec} = 1$ (as well as $N_i^{\rm phot} = 1$) throughout.
Similarly

$$ w_p(R) = N_R N_s \sum_{i,j} \frac{c_{i,j}}{N_i^{\rm phot} N_j^{\rm spec}}\,
   \frac{D_sD_p(R,\Delta\chi)}{D_sR_p(R,\Delta\chi)} - \sum_{i,j} c_{i,j} , \tag{13} $$

where $N_s$ is the total number of spectroscopic objects analyzed in
the spectroscopic bin of interest and $\Delta\chi$ is the size of the
comoving window integrated over around each spectroscopic object. The
additional normalization of $N_s$ arises by analogy with Eq. (11) and
the addition of new spectroscopic slices. The extent of the
comoving window is entirely flexible, requiring some trial-and-error to
determine the optimal choice, although $\Delta\chi \sim O(50$-$100\,h^{-1}\,{\rm Mpc})$,
as used when integrating out the spectroscopic autocorrelation to
eliminate the effects of redshift-space distortions, is an obvious
choice. This slightly enhanced approach should provide additional
signal-to-noise gains over Eq. (11) provided the photometric PDFs
are sufficiently sampled to accurately estimate their overlap with
small comoving distance intervals. We illustrate this final, full
pair-weighted approach in Figure 3.

Figure 3. The calculation of $f_{i,j}$, the "comoving overlaps" for the
pair-weighted approach of Eq. (13). A comoving window ($\Delta\chi = \pm 100\,h^{-1}\,{\rm Mpc}$
in the case of this plot) is adopted around each spectroscopic QSO, which
are indexed $j$. There will be many spectroscopic QSOs in a given redshift
bin of interest but here we plot only two, at $z = 1.90$ and $z = 2.19$, for
illustrative purposes. Each photometric PDF, indexed $i$, is then averaged
across each of the comoving windows to produce pairs of weights $f_{i,j}$. We
display the case for $N_i^{\rm phot} = N_j^{\rm spec} = 1$ in Eq. (12) but any number
$N_i^{\rm phot}$ of PDFs and $N_j^{\rm spec}$ spectroscopic slices can be combined into
ensembles.

Imp.       1.87  1.61  1.22  1.63  1.53  1.40  1.77  1.90
(Imp.)^2   3.50  2.60  1.48  2.65  2.35  1.96  3.15  3.63

Table 1. "Imp." is the expected improvement due to our new method
(Eq. 11) over the old ensemble approach (§2.1) as characterised by Eq. (7).
As this value approximates the improvement in Poisson noise, its square
approximates the equivalent increase in survey size.



3 DATA

Although our main result is the new methodology outlined in §2,
in §4 we will illustrate our new method with real-world samples to
demonstrate the improvements that it can return. We will make use
of quasars selected from the SDSS, as described here.



3.1 Photometric Quasars 

The photometric quasar sample that we analyze is constructed
using the Kernel Density Estimation (KDE) technique of
Richards et al. (2004), a technique to classify quasars in
photometric surveys which draws on several innovations inherent to
the SDSS (e.g., York et al. 2000): extensive and carefully
monitored ugriz imaging (e.g., Gunn et al. 1998; Hogg et al. 2001)
calibrated to a standard photometric system (e.g., Fukugita et al. 1996;
Smith et al. 2002) with a precision of a few hundredths of a
magnitude (Ivezic et al. 2004). These innovations allow quasars to be
more easily separated from the stellar locus. We use the DR6 KDE
sample, which is detailed in full in Richards et al. (2009a).

The DR6 KDE sample is drawn from a test sample of all point
sources in the SDSS DR6 imaging data (Adelman-McCarthy et al.
2008) with $i < 21.3$, where $i$ refers to the asinh magnitude
(Lupton, Gunn & Szalay 1999) in the "uber-calibrated" system of
Padmanabhan et al. (2008). The DR6 primary imaging data
covers an area of 8417 deg^2 but further cuts (Myers et al. 2006;
Richards et al. 2009a) remove approximately 150 deg^2 or 1.7% of
the area.

In this paper we concern ourselves only with DR6 KDE
objects that have a very high probability of being QSOs. As such, we
apply a uvxts=1 cut within the sample. This cut selects QSOs at
particularly high efficiency by limiting the DR6 KDE sample to QSOs
that would have been selected by traditional UV-excess techniques.
As noted in Table 4 of Richards et al. (2009a), and discussed in
Myers et al. (2006), only ~5% of the uvxts=1 QSOs should, in
reality, be stars². The UV-excess nature of the uvxts=1 cut limits the
spectroscopic redshift range to $0.8 \lesssim z \lesssim 2.4$.



3.1.1 Redshift Distribution of Photometric Quasars 

While estimating the redshift of a QSO with a large number of
narrow filters can be precise (e.g., Hatziminaoglou et al. 2000;
Wolf et al. 2001, 2003), results using broadband filters are more
mixed (e.g., Richards et al. 2001; Budavari et al. 2001). Although
photometric redshifts are often expressed as a single value, they
are, in reality, probabilistic, with a full probability density function
(or PDF) representing the possible redshifts the object of interest
could occupy given the filter information. Our main goal in this
paper is to incorporate full PDF information into clustering analyses.
If we denote by $P_j(z)$ the probability density function for QSO $j$,
and assume $\int P_j(z)\,dz = 1$ across all possible redshifts, then the
value that will interest us is the fraction of the ensemble PDF that
will genuinely lie in any redshift interval $z_1 < z < z_2$:

$$ f_z = \frac{1}{N^{\rm phot}} \sum_{j=1}^{N^{\rm phot}}
   \int_{z_1}^{z_2} dz\, P_j(z) . \tag{14} $$

This fraction can be deduced for arbitrary redshift intervals and
could correspond to a single photometric QSO ($N^{\rm phot} = 1$)
having, say, a 60.3% chance of lying in the redshift range of interest,
or equivalently a sample of 100 PDFs in an ensemble from which
we might derive that 60.3 of the 100 QSOs in the ensemble can be
expected to actually lie in the interval of interest.

² Richards et al. (2009a) advocate a cut on the "good" flag to improve
efficiency. We ignore this, as for uvxts=1 it only discards a further 2.4%
of the data.

Figure 4. The ratio of the bootstrap error to the Poisson error for the old,
ensemble method of §2.1. We plot three separate realizations to
demonstrate that the error is stable to ~ 1% for 10,000 bootstraps. The bootstrap
error tracks the Poisson error to around 6%. On scales $< 0.5\,h^{-1}\,{\rm Mpc}$,
where there are few QSO pairs, 10,000 bootstraps is insufficient to recreate
the shot noise. On scales $> 20\,h^{-1}\,{\rm Mpc}$, where QSO pairs are not
independent, Poisson errors underestimate the true error. This plot demonstrates
that bootstrapping (at N=10,000) and Poisson errors agree well in the range
$0.5 < R < 20\,h^{-1}\,{\rm Mpc}$.

We obtain our PDFs using the Nearest Neighbour approach
outlined in Ball et al. (2008). We perturb a QSO's colours relative
to a spectroscopic training set drawn from the DR5 QSO
sample (Schneider et al. 2007), determine the nearest neighbour over
100 perturbations, and build a function that describes the
probability that the photometric quasar matches near spectroscopic
neighbours³. Examples of these PDFs are shown in Figures 1 and 2.
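Given PDFs of this kind tabulated on a redshift grid, the overlap fraction of Eq. (14) is a direct numerical integral. The sketch below is a minimal illustration using the trapezoidal rule with linear interpolation at the bin edges. The function names, the common-grid assumption and the renormalisation step are choices made here, not details of the Ball et al. (2008) method.

```python
def fraction_in_bin(z_grid, pdf, z1, z2):
    """Fraction of one tabulated PDF that lies in z1 < z < z2.

    z_grid : increasing redshift grid
    pdf    : P(z) sampled on z_grid (renormalised here to unit area)
    """
    def trapz(lo, hi):
        total = 0.0
        for za, zb, pa, pb in zip(z_grid, z_grid[1:], pdf, pdf[1:]):
            a, b = max(za, lo), min(zb, hi)
            if b <= a:
                continue  # this grid cell does not overlap [lo, hi]
            # linearly interpolate the PDF at the clipped end points
            ya = pa + (pb - pa) * (a - za) / (zb - za)
            yb = pa + (pb - pa) * (b - za) / (zb - za)
            total += 0.5 * (ya + yb) * (b - a)
        return total

    return trapz(z1, z2) / trapz(z_grid[0], z_grid[-1])


def ensemble_fraction(z_grid, pdfs, z1, z2):
    """Average of the per-object fractions over an ensemble, as in Eq. (14)."""
    return sum(fraction_in_bin(z_grid, p, z1, z2) for p in pdfs) / len(pdfs)
```

Dividing the returned fraction by the comoving width of the bin, $\chi_2 - \chi_1$, gives the comoving overlap per unit length used as a weight in the estimator.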



3.2 Spectroscopic Quasars 

We cross correlate the above QSOs with a sample of spectroscopic 
QSOs dra wn from the DR6 QSO sample (Schneider et al. 2009 
in prep, see lSchneider et alll2007h . Our spectroscopic QSO sample 
populates the sky in a complex manner but for our method, only the 
distribution of the photometric sample, which is far simpler, needs 
to be modeled. 

We impose the criterion that our spectroscopic QSOs must
also appear in the photometric sample discussed in §3.1. We make
no additional cuts on flags or redshift quality, as the vast majority of
quasar redshifts are reliable if the object is, indeed, a QSO, and the
cuts made by Richards et al. (2009a) help ensure both the quality
of the photometry of the QSO, and the likelihood that it is a QSO.

^ Our PDFs for the DR6KDE catalog will be made available at
http://lcdm.astro.uiuc.edu/nbckde_dr6-pdfs



4 EXAMPLE IMPLEMENTATION OF THE NEW 
OPTIMAL ESTIMATOR 

In this section, we apply the method developed in §2 to the
spectroscopic and photometric QSO samples discussed in §3 to illustrate
both our new methodology and its statistical gains over current
methods. As our goal is a simple demonstration of our new methodology,
we apply no cuts to the samples beyond those discussed in
§3. This ensures that any improved signal is due to the method itself,
rather than any additional magnitude, colour or redshift cuts
that we might impose. As outlined in §3, the only significant cut
we employ is the uvxts=1 cut within the photometric sample. This
cut, which is purely to ensure that almost all of our photometric
objects are genuinely QSOs, limits our spectroscopic redshift range
to 0.8 < z < 2.4.



4.1 Expected Improvement in Signal 

Eq. (7) allows us to estimate how treating each photometric QSO's
PDF individually (i.e. Eq. 11) will improve the clustering signal
over treating the photometric QSOs as an ensemble (as discussed
in §2.1). In Figure 2 we demonstrate the calculation of <f(χ*)> for
two different approaches: the ensemble approach of §2.1 and our
new bin-weighted approach (Eq. 11), which treats each f_i individually.
In Table 1 we show the expected improvement implied by
Eq. ^ for a range of spectroscopic redshift bins. This improve- 
ment arises from using all of the information inherent in every 
PDF for every individual photometric object and is about a factor
of ~1.6. Based on Poisson statistics, simply using our new
approach should be roughly equivalent to having a ~2-3× larger
survey.
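
The step from a gain in signal-to-noise to an equivalent survey size can be written out explicitly. This is a sketch assuming pure Poisson noise, so that the error scales as the inverse square root of the number of objects; the symbols g and N_eff are introduced here for illustration only.

```latex
\frac{\sigma_{\rm old}}{\sigma_{\rm new}} = g , \qquad
\sigma \propto N^{-1/2}
\;\Longrightarrow\;
\frac{N_{\rm eff}}{N} = g^{2} , \qquad
g \simeq 1.6 \;\Longrightarrow\; \frac{N_{\rm eff}}{N} \simeq 2.6
```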



4.2 Actual Improvement in Signal 

Poisson errors are typically used to calibrate the noise in a clustering
estimator (e.g., Landy & Szalay 1993):



    \Delta w_\theta(R) = \frac{1 + w_\theta(R)}{\sqrt{D_s D_p(R)}}    (15)
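
As a short illustration of Eq. (15), the Poisson error is one plus the measured correlation, divided by the square root of the number of spectroscopic-photometric data pairs in the bin. The sketch below assumes that reading of the equation; the function and argument names are hypothetical.

```python
import numpy as np

def poisson_error(w_theta, n_pairs):
    """Poisson error (1 + w) / sqrt(DsDp) for each radial bin."""
    w_theta = np.asarray(w_theta, dtype=float)
    n_pairs = np.asarray(n_pairs, dtype=float)
    return (1.0 + w_theta) / np.sqrt(n_pairs)

# Example: three bins with increasing pair counts (toy numbers only).
errors = poisson_error([1.0, 0.3, 0.05], [400, 2500, 10000])
```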



Poisson errors accurately reflect the clustering noise on small scales
(where many pairs remain independent) and remain very accurate
for the photometric sample being used out to at least 20 h^{-1} Mpc
(e.g., consider deprojecting Figure 1 of Myers et al. 2006). Poisson
errors are more complex to calculate for our new methodology because
we incorporate pairs of points with unequal weights, some
that may be completely outside the spectroscopic bin of interest,
but they can in principle be computed. However, we estimate the
errors by simply bootstrapping (e.g., Efron & Gong 1983) on the
individual spectroscopic QSOs, as was done in Padmanabhan et al.
(2009). This approach is additionally useful as it demonstrates how
one might estimate errors for our new approach based on other
resampling approaches, such as jackknifes or field-to-field variations.
Resampling approaches are generally more accurate than Poisson
errors on large scales and facilitate the construction of a full
covariance matrix. Our preferred expressions for our new estimators
Figure 5. w_p(R) as measured by the old, ensemble estimator (diamonds;
Eq. 4), our new bin-weighted estimator (crosses; Eq. 11) and our full
pair-weighted estimator (triangles; Eq. 13). The pair-weighted estimator
for this plot used a comoving window of ±50 h^{-1} Mpc. All plotted data
are for QSOs with spectroscopic redshifts in the bin 1.8 < z_spec < 2.2.
We fit a γ = 1.5 power law over 1.6 < R < 40 h^{-1} Mpc to each estimate
using the full covariance matrix estimated from 10,000 bootstraps. The points
have been offset slightly for display purposes. The best-fit value of the
comoving scale length r_0 (see Eq. 16) is displayed for each data set,
together with the (1σ) error on the fit.



(Eqs. 11 and 13) make it straightforward to track how each spectroscopic
QSO affects the pair counts and quickly construct resampled
error estimates.
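
Bootstrapping on the spectroscopic QSOs can be sketched as below. This is an illustration under stated assumptions rather than our pipeline: it supposes that the data-data and (normalised) data-random pair counts have been stored per spectroscopic QSO and per radial bin, and it uses a simple DD/DR - 1 estimator as a stand-in for the estimators of §2. All names are hypothetical.

```python
import numpy as np

def bootstrap_errors(dd_per_qso, dr_per_qso, n_boot=1000, seed=0):
    """Bootstrap covariance and errors by resampling spectroscopic QSOs."""
    dd = np.asarray(dd_per_qso, dtype=float)   # shape (n_spec, n_bins)
    dr = np.asarray(dr_per_qso, dtype=float)   # shape (n_spec, n_bins)
    n_spec, n_bins = dd.shape
    rng = np.random.default_rng(seed)
    samples = np.empty((n_boot, n_bins))
    for b in range(n_boot):
        # Draw spectroscopic QSOs with replacement and re-sum their pair counts.
        idx = rng.integers(0, n_spec, size=n_spec)
        samples[b] = dd[idx].sum(axis=0) / dr[idx].sum(axis=0) - 1.0
    cov = np.cov(samples, rowvar=False)        # full covariance between bins
    return cov, samples.std(axis=0, ddof=1)

# Toy pair counts for 200 spectroscopic QSOs in 4 radial bins (illustrative only).
toy_rng = np.random.default_rng(1)
dd_toy = toy_rng.poisson(5.0, size=(200, 4))
dr_toy = np.full((200, 4), 4.0)
cov, err = bootstrap_errors(dd_toy, dr_toy, n_boot=500)
```

Because the pair counts are stored per spectroscopic QSO, each bootstrap realisation only requires re-summing rows, which is what makes large numbers of resamplings inexpensive.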

In Figure 4 we plot the relationship between the Poisson and
bootstrap errors derived for the ensemble estimator (i.e., derived
using only QSOs with peak PDF solutions in the spectroscopic
bin of interest, as discussed in §2.1), using a spectroscopic bin of
1.8 < z_s < 2.2. Across scales of 0.2 < R < 50 h^{-1} Mpc the
bootstrap errors converge to within ~0.8% for 10,000 bootstraps,
and the amplitude of the bootstrap errors closely tracks (within
~5-10%) that of the Poisson errors. This demonstrates that
bootstrapping on the spectroscopic QSOs is close to equivalent to using
Poisson errors on the scales of interest. On scales < 0.5 h^{-1} Mpc,
where there are few QSO pairs, more bootstrap samples are likely
needed to recreate the precision of the Poisson errors. On scales
> 20 h^{-1} Mpc the Poisson errors likely begin to underestimate
the noise as covariance increases.

Having demonstrated the validity of bootstrapping to obtain
estimates of the noise, we plot the results for the old ensemble
approach, our new bin-weighted estimator (Eq. 11) and our full
pair-weighted estimator (Eq. 13) in Figure 5. To summarise our results
we fit power laws to our data. A power-law 3D correlation function
of the form ξ(r) = (r/r_0)^{-γ} produces a power-law projected
correlation function



    \frac{w_p(R)}{R} = \left(\frac{r_0}{R}\right)^{\gamma}
        \frac{\Gamma(1/2)\,\Gamma[(\gamma-1)/2]}{\Gamma[\gamma/2]}    (16)
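
The projected power law of Eq. (16) is straightforward to evaluate. The sketch below assumes the standard convention in which ξ is integrated along the full line of sight, giving the prefactor Γ(1/2)Γ[(γ-1)/2]/Γ[γ/2]; the function name is illustrative.

```python
from math import gamma

def wp_power_law(R, r0, gam):
    """Projected correlation function for xi(r) = (r / r0) ** (-gam)."""
    # Ratio of gamma functions from integrating the power law along the line of sight.
    amp = gamma(0.5) * gamma(0.5 * (gam - 1.0)) / gamma(0.5 * gam)
    return R * (r0 / R) ** gam * amp

# Example: model value at R = 10 for r0 = 4.5 and the fixed index 1.5
# (lengths in the same units, e.g. h^-1 Mpc).
wp_10 = wp_power_law(10.0, 4.5, 1.5)
```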



We fit this form to the measured correlations over the range
1.6 < R < 40 h^{-1} Mpc, using the full bootstrap covariance
and holding the index fixed at γ = 1.5. In order to improve
the numerical stability of this procedure, we scale w_p(R)
by R^{γ-1}, thereby removing the artificially high condition number
that arises due to the large dynamic range of w_p. The power-law
fit for the old, ensemble, approach gives r_0 = 4.20 ± 0.88, our
new bin-weighted estimator (Eq. 11) gives r_0 = 4.22 ± 0.65
Table 2. Improvement of our new bin-weighted estimator (Eq. 11) over
the old methodology of §2.1. Each column represents a bin width of 0.4
in (spectroscopic) redshift centred on z. The scales in the first column are
logarithmic at five-per-decade. Table values are the ratio between jackknife
errors for the new to the old estimator (σ_new/σ_old). The final row is the
total improvement over 1 < R < 20 h^{-1} Mpc. Squaring the table values
approximates the equivalent increase in survey size obtained by using our
estimator.



(2σ) and our full pair-weighted method (Eq. 13) gives r_0 =
4.56 ± 0.48 (2σ), which agree well with numerous recent estimates
of the amplitude of w_p for QSO clustering near z ~ 2
(e.g., Porciani, Magliocchetti & Norberg 2004; Croom et al. 2005;
Porciani & Norberg 2006; da Angela et al. 2008). We list 2σ errors
to reflect the fact that our errors are likely underestimated on large
scales, but the relative improvements for our new estimators are
identical whether we quote 1σ or 2σ errors.
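
A fixed-index power-law fit with a full covariance matrix can be sketched as a one-parameter generalised least-squares problem. This is one possible way to carry out such a fit and is not necessarily the minimiser used for the values quoted above: it assumes the standard gamma-function prefactor for the projected power law, rescales the data by R^(γ-1) so that the model is a constant, and propagates the amplitude error to r_0 at first order. All names are hypothetical.

```python
import numpy as np
from math import gamma

def fit_r0(R, wp, cov, gam=1.5):
    """Fit r0 at fixed index gam to wp(R) using the full covariance matrix."""
    R = np.asarray(R, dtype=float)
    wp = np.asarray(wp, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Rescale so that a power law of index (1 - gam) becomes a constant.
    scale = R ** (gam - 1.0)
    d = wp * scale
    c = cov * np.outer(scale, scale)
    ones = np.ones_like(d)
    cinv_1 = np.linalg.solve(c, ones)
    cinv_d = np.linalg.solve(c, d)
    # Generalised least-squares estimate of the constant amplitude and its error.
    amp = (ones @ cinv_d) / (ones @ cinv_1)
    amp_err = (ones @ cinv_1) ** -0.5
    # Convert the amplitude to r0 (assumes amp > 0).
    g = gamma(0.5) * gamma(0.5 * (gam - 1.0)) / gamma(0.5 * gam)
    r0 = (amp / g) ** (1.0 / gam)
    r0_err = r0 * amp_err / (gam * amp)   # first-order error propagation
    return r0, r0_err

# Noise-free toy data generated with r0 = 4.2 and 10% diagonal errors.
R_toy = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
g_toy = gamma(0.5) * gamma(0.25) / gamma(0.75)
wp_toy = R_toy * (4.2 / R_toy) ** 1.5 * g_toy
cov_toy = np.diag((0.1 * wp_toy) ** 2)
r0_fit, r0_sigma = fit_r0(R_toy, wp_toy, cov_toy)
```

The rescaling leaves the best-fit amplitude unchanged in exact arithmetic; its purpose is only to keep the entries of the covariance matrix at comparable magnitudes before solving the linear systems.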

It is clear from the fits and errorbars in Figure 5 that our new
bin-weighted estimator (Eq. 11), which utilises all of the redshift
information in the PDF, not just the peak of the PDF, considerably
improves the signal-to-noise in estimates of w_p(R). In Table 2
we list the improvement in signal-to-noise as a function of
redshift and scale for our sample. Our new bin-weighted estimator,
across scales that are typically used to represent the quasi-linear
regime of clustering (1 < R < 20 h^{-1} Mpc), improves the
signal-to-noise of clustering estimates by 30%. Adopting our most basic
approach of incorporating full PDFs into a clustering measurement
is thus equivalent to increasing the size of the photometric sample
discussed in §3.1 by 60%. Photometric redshift determinations for
QSOs in broadband ugriz are particularly poor outside of the range
1 < z < 2. Outside of this range, the improvement yielded by our
bin-weighted estimator is slightly larger, equivalent to increasing
the survey size by 80%.

We note that our improvements in Table 2 are slightly smaller
than the expected improvements listed in Table 1. This could
reflect a breakdown in our assumption of Poisson errors or inaccuracy
in our PDFs. In fact, one novel application of our methodology
would be to tune the PDFs until the figures in Table 2 peaked, thus
constructing PDFs without using any colour information (see also
Schneider et al. 2006).

In Table 3 we list the improvement in signal-to-noise as a function
of scale using our full pair-weighted estimator (Eq. 13) for a
spectroscopic redshift bin of 1.8 < z < 2.2. We adopt a representative
range of comoving windows (see the discussion of Δχ ~
O(50-100 h^{-1} Mpc) near Eq. 13). The improvement in signal-to-noise
is about a factor of 2 for scales that are typically used to represent
the quasi-linear regime of clustering (1 < R < 20 h^{-1} Mpc).
Across some scales the improvement in signal approaches a factor
of 2.2 for a comoving window of Δχ = ±50 h^{-1} Mpc.
Impressively, this means that our full pair-weighted estimator can






R               Eq. (11)    Eq. (13); Δχ in h^{-1} Mpc
(h^{-1} Mpc)                ±200      ±100      ±50

0.8             1.39        1.41      1.76      2.03
1.3             1.35        1.39      1.80      2.10
2.0             1.39        1.43      1.79      2.10
3.2             1.38        1.44      1.81      2.16
5.1             1.36        1.42      1.76      2.05
8.2             1.34        1.42      1.79      2.16
12.9            1.35        1.39      1.77      2.11
20.5            1.33        1.34      1.70      1.99

1 < R < 20      1.33        1.34      1.68      2.04



Table 3. Improvement of our full pair-weighted estimator (Eq. 13) over the
old methodology of §2.1 and our binned estimator (Eq. 11). Each calculation
is over a spectroscopic bin of 1.8 < z < 2.2. Table values are the
ratio between jackknife errors for the new estimators as compared to the
old estimator (σ_new/σ_old). For the full pair-weighted estimator (Eq. 13)
the columns are the adopted comoving window around each spectroscopic
QSO. The equivalent window for Eq. (11) would be 220 h^{-1} Mpc,
corresponding to the full bin 1.8 < z < 2.2. The final row is the total
improvement over 1 < R < 20 h^{-1} Mpc. Squaring the table values
approximates the equivalent increase in survey size obtained by using our
estimators.

potentially improve clustering by a factor equivalent to increasing 
the size of a survey by a factor of 4-5. 
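
The factor of 4-5 follows from squaring the tabulated ratios, as the caption of Table 3 notes. The short check below uses the per-scale values from the ±50 h^{-1} Mpc column of Table 3; the variable names are illustrative, and the squaring assumes Poisson-dominated errors.

```python
# Per-scale ratios from the +/-50 h^-1 Mpc column of Table 3.
ratios_50 = [2.03, 2.10, 2.10, 2.16, 2.05, 2.16, 2.11, 1.99]

# Under Poisson statistics the equivalent survey-size gain is the ratio squared.
survey_gain = [round(r ** 2, 2) for r in ratios_50]
print(min(survey_gain), max(survey_gain))
```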

The improvements in Tables 2 and 3 demonstrate that the
PDFs we use must carry additional information that can be used
to improve clustering signal, which was the main goal of this paper.
In future, as our knowledge of PDF construction is refined, the
improvements facilitated by our method can only increase.



5 CONCLUSIONS 

We have introduced new correlation function estimators to improve 
measurements of how photometric objects cluster around spectro- 
scopic objects. Spectroscopic-photometric cross-correlations have 
known benefits, due to the spectroscopic objects having narrowly- 
defined distance information and the photometric objects having 
significantly higher number densities. Our approach uses the full 
photometric probability density information, or PDFs, to optimise 
such cross-correlation estimates in the Poisson limit. We note that it
is possible that a strict Poisson weighting for pairs can be improved 
upon, particularly on moderate scales. 

We have additionally provided simple equations that can be 
used to calculate when our new estimators will improve on mea- 
surements from the spectroscopic autocorrelation. The parameters 
of interest are the overlap of the photometric data with the spectro- 
scopic bin in comoving space, which depends on the PDF precision, 
and the relative number of photometric and spectroscopic objects. 
Because the number of photometric objects scales as the square 
of the comoving overlap, it can be difficult for spectroscopic-
photometric cross-correlations to improve on spectroscopic auto- 
correlation estimates. 

Our improved estimator has several benefits over existing
cross-correlation methods. Most obviously, because our estimator
does not solely rely on the "peak" of a photometric object's
PDF to determine which photometric objects should be
cross-correlated against the spectroscopic objects of interest, the
information from more photometric objects is used in clustering
estimates. We show that, in the case of photometric QSOs, simply
using the bin-weighted form of our estimator (Eq. 11) can thus
improve signal-to-noise in the Poisson limit in a manner equivalent
to obtaining almost 2 x as much survey data. Eq. Q suggests that 
the full gains on all scales may be closer to equivalent to obtaining
3× as much survey data. Indeed, our full pair-weighted estimator
(Eq. 13) demonstrates that gains equivalent to increasing survey
size by as much as a factor of 4-5 can be realised. Although we
have specifically used the example of QSOs, we stress that our
estimator can and should be used to improve the signal for any
real-space clustering measurement using photometric redshifts.

The current incarnation of our method has several shortcomings.
If the PDFs peak sharply relative to the spectroscopic redshift
distribution then f(χ) cannot be validly extracted, and the
full integration across Eq. (2) must be applied. Our assumptions
similarly break down if the spectroscopic survey selection function
varies rapidly across the redshift bin of interest. In these cases
the full 2D correlation function must be integrated in the
line-of-sight direction. These inadequacies cannot be countered by
narrowing the spectroscopic bin indefinitely, as redshift-space
distortions ultimately limit the scale where redshifts map to
line-of-sight distances. As such, our assumptions are most robust for
the pair-weighted methodology of Eq. (13). In this pair-weighted approach,
a strict spectroscopic window of, say, ±50 h^{-1} Mpc can be enforced,
and our assumptions would then be valid until the PDFs are
more precise than ±50 h^{-1} Mpc or the spectroscopic distribution
varies rapidly over ±50 h^{-1} Mpc.

A particular benefit of our estimator is that it can, very sim- 
ply, incorporate every photometric object into an analysis, negating 
the need to bin the photometric objects. PDFs of varying preci- 
sion from a range of photometric data can thus be simply com- 
bined in a single measurement, provided the mask of photomet- 
ric object detections is well-controlled. One could thus envisage 
taking, say, multi-wavelength photometry from patchy space tele- 
scope data or a range of small dedicated surveys (to improve PDFs 
where possible) embedded in uniform optical photometry such as 
the SDSS (to establish detections of the photometric objects of in- 
terest), and straightforwardly cross-correlating this complex pho- 
tometric data with a completely different spectroscopic data set. 
Further, there is no reason to limit the probabilistic information to 
a photometric redshift. Many techniques, such as star-galaxy
separation or the star-QSO separation technique we have used in this
paper (Richards et al. 2009a), provide classification probabilities as
well as photometric redshifts. Such classification probabilities can
naturally be incorporated into our method by, e.g., weighting a PDF
heavily to z = 0 if an object has a high probability of being a star.
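
One simple way to fold a classification probability into a redshift PDF is sketched below. It is an illustration only, assuming a PDF tabulated on a uniform redshift grid that starts at z = 0 and a single stellar probability per object; the mixing rule and all names are assumptions rather than a prescription from this paper.

```python
import numpy as np

def fold_in_star_probability(z_grid, pdf, p_star):
    """Move a fraction p_star of the PDF's probability into the z = 0 cell."""
    z_grid = np.asarray(z_grid, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    dz = z_grid[1] - z_grid[0]            # uniform grid spacing is assumed
    mixed = (1.0 - p_star) * pdf          # down-weight the extragalactic PDF
    mixed[0] += p_star / dz               # concentrate p_star at z = 0
    return mixed

# Toy example: a Gaussian PDF at z = 2 and a 30% chance of being a star.
z = np.linspace(0.0, 5.0, 501)
dz = z[1] - z[0]
pdf = np.exp(-0.5 * ((z - 2.0) / 0.1) ** 2)
pdf /= pdf.sum() * dz                      # normalise to unit (rectangle-rule) integral
mixed = fold_in_star_probability(z, pdf, 0.3)
in_bin = (z > 1.8) & (z < 2.2)
```

With this choice, the weight that the object contributes to any redshift bin away from z = 0 is reduced by the factor (1 - p_star), so likely stars are automatically down-weighted in the cross-correlation.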

Because of the flexibility of our estimator, it should be useful 
anywhere on the sky where spectroscopic data is embedded in deep, 
potentially complex and multi-wavelength, photometric data. This 
should make our estimator particularly useful for regions of the sky 
where extensive spectroscopy, such as from BOSS, the various 2dF 
surveys and the SDSS, is embedded in deep, well-calibrated pho- 
tometry, with measurable PDFs such as from Pan-STARRS, DES 
and the LSST. Over the next decade, we expect that obvious appli- 
cations of our estimator will include improved measurements of the 
clustering of photometric LBGs, LRGs and QSOs around spectro- 
scopic QSOs and measuring the clustering of photometric galaxies 
and QSOs around absorption features in QSO spectra. 